System and method to support single instance storage operations

ABSTRACT

Systems and methods for single instance storage operations are provided. Systems constructed in accordance with the principles of the present invention may process data containing a payload and associated metadata. Often, chunks of data are copied to traditional archive storage wherein some or all of the chunk, including the payload and associated metadata, is copied to the physical archive storage medium. In some embodiments, chunks of data are designated for storage in single instance storage devices. The system may remove the encapsulation from the chunk and may copy the chunk payload to a single instance storage device. The single instance storage device may return a signature or other identifier for items copied from the chunk payload. The metadata associated with the chunk may be maintained in separate storage and may track the association between the logical identifiers and the signatures, which may be generated by the single instance storage device, for the individual items of the chunk payload.

PRIORITY CLAIM

This application claims the benefit of U.S. provisional application No. 60/626,076 titled SYSTEM AND METHOD FOR PERFORMING STORAGE OPERATIONS IN A COMPUTER NETWORK, filed Nov. 8, 2004, and U.S. provisional application No. 60/625,746 titled STORAGE MANAGEMENT SYSTEM filed Nov. 5, 2004, each of which is incorporated herein by reference in its entirety.

RELATED APPLICATIONS

This application is related to the following patents and pending applications, each of which is hereby incorporated herein by reference in its entirety:

application Ser. No. 09/354,058, titled HIERARCHICAL BACKUP AND RETRIEVAL SYSTEM, filed Jul. 15, 1999, attorney docket number 4982/5;

U.S. Pat. No. 6,418,478, titled PIPELINED HIGH SPEED DATA TRANSFER MECHANISM, issued Jul. 9, 2002, attorney docket number 4982/6;

application Ser. No. 60/460,234, SYSTEM AND METHOD FOR PERFORMING STORAGE OPERATIONS IN A COMPUTER NETWORK, filed Apr. 3, 2003, attorney docket number 4982/35;

application Ser. No. 60/482,305, HIERARCHICAL SYSTEM AND METHOD FOR PERFORMING STORAGE OPERATIONS IN A COMPUTER NETWORK, filed Jun. 25, 2003, attorney docket number 4982/39;

Application Ser. No. 60/519,526, SYSTEM AND METHOD FOR PERFORMING PIPELINED STORAGE OPERATIONS IN A COMPUTER NETWORK, filed Nov. 13, 2003, attorney docket number 4982/46P;

application Ser. No. 10/803,542, METHOD AND SYSTEM FOR TRANSFERRING DATA IN A STORAGE OPERATION, filed Mar. 18, 2004, attorney docket number 4982/49;

Application Serial Number to be assigned, titled SYSTEM AND METHOD FOR PERFORMING MULTISTREAM STORAGE OPERATIONS, filed Nov. 7, 2005, attorney docket number 4982-59;

Application Serial Number to be assigned, titled METHOD AND SYSTEM OF POOLING STORAGE DEVICES, filed Nov. 7, 2005, attorney docket number 4982-61;

Application Serial Number to be assigned, titled METHOD AND SYSTEM FOR SELECTIVELY DELETING STORED DATA, filed Nov. 7, 2005, attorney docket number 4982-67;

Application Serial Number to be assigned, titled METHOD AND SYSTEM FOR GROUPING STORAGE SYSTEM COMPONENTS, filed Nov. 7, 2005, attorney docket number 4982-69;

Application Serial Number to be assigned, titled SYSTEMS AND METHODS FOR RECOVERING ELECTRONIC INFORMATION FROM A STORAGE MEDIUM, filed Nov. 7, 2005, attorney docket number 4982-68; and

Application Serial Number to be assigned, titled METHOD AND SYSTEM FOR MONITORING A STORAGE NETWORK, filed Nov. 7, 2005, attorney docket number 4982-66.

COPYRIGHT NOTICE

A portion of the disclosure of this patent document contains material which is subject to copyright protection. The copyright owner has no objection to the facsimile reproduction by anyone of the patent document or the patent disclosure, as it appears in the Patent and Trademark Office patent files or records, but otherwise reserves all copyright rights whatsoever.

BACKGROUND OF THE INVENTION

The invention disclosed herein relates generally to performing storage operations in a computer network. More particularly, the present invention relates to systems and methods for supporting single instance storage devices in a computer network.

Storage of electronic data has evolved through many forms. During the early development of the computer, storage of data was limited to individual computers. Electronic data was stored in the Random Access Memory (RAM) or some other storage medium such as a hard drive or tape drive that was an actual part of the individual computer.

Later, with the advent of networked computing, storage of electronic data gradually migrated from the individual computer to stand-alone storage devices and other storage devices accessible via a network, for example a tape library accessible via a network server or other computing device. These network storage devices soon evolved in the form of networked tape drives, libraries, optical libraries, Redundant Arrays of Inexpensive Disks (RAID), CD-ROM jukeboxes, and other devices. System administrators often use network storage devices to perform storage operations and make backup copies and other copies of data stored on individual client computers in order to preserve data against accidental loss, corruption, physical damage, and other risks.

Storage systems evolved to handle increasingly complex storage operations and increasingly large volumes of data. For example, some storage management systems began organizing system components and system resources into logical groupings and hierarchies such as storage operation cells of the CommVault QiNetix storage management system, available from CommVault Systems, Inc. of Oceanport, N.J., and as further described in Application Ser. No. 60/482,305 and application Ser. No. 09/354,058, which are hereby incorporated by reference in their entirety.

Another factor contributing to increasingly large volumes of data is storage of multiple copies of the same file or data item. For example, a large enterprise might have several hundred users each keeping a copy of the same e-mail attachment. Alternatively, individual users may also lose track of or otherwise retain several copies of a file on their own personal hard drive or network share. Thus, storage space on systems is being wasted by multiple instances of the same data.

To address this problem, companies have developed storage devices that support single instance storage. Data items copied to a single instance storage device are processed to determine a unique signature for each file. Thus, copies or instances of the same file will generate the same unique signature. One well known technique for generating such a signature is generating a cryptographic hash of the file or other similar checksum based on the file contents. Storage devices can then compare the signature for a file to be stored with a list of previously stored signatures to determine whether a copy of the file already exists in storage and thus the file need not be copied again. Some storage systems also use content addressable storage (“CAS”) in single instance storage devices in which the signature or hash of the file is also used as the address of the file in the storage device.
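The signature comparison described above can be illustrated with a minimal sketch. It assumes SHA-256 as the cryptographic hash and an in-memory dictionary as the storage device; the class name `SingleInstanceStore` and its methods are hypothetical and chosen only for illustration.

```python
import hashlib


class SingleInstanceStore:
    """Toy content addressable store: the signature doubles as the address."""

    def __init__(self):
        self._blobs = {}  # signature -> file contents

    def put(self, data: bytes) -> str:
        # Assumption: SHA-256 stands in for whatever signature a device uses.
        signature = hashlib.sha256(data).hexdigest()
        # Copy the data only if no item with this signature is already stored.
        if signature not in self._blobs:
            self._blobs[signature] = data
        return signature

    def get(self, signature: str) -> bytes:
        return self._blobs[signature]

    def __len__(self) -> int:
        return len(self._blobs)
```

Storing the same attachment twice through `put` returns the same signature both times and leaves a single stored instance.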

One problem associated with single instance storage solutions is that they are not designed to process backup data stored as chunks. When a copy of a production data store or other large volume of data is made, the data is often divided into a number of smaller parts for easier transmission to archive media via the network. These smaller parts typically become encapsulated as the payload for chunks of data which include metadata, such as tag headers and footers as previously described in U.S. application Ser. No. 10/803,542 and U.S. Pat. No. 6,418,478, each of which is hereby incorporated by reference in its entirety, and the chunks of data are sent over the network to the archive storage. For example, each chunk may contain a payload of several thousand files from a larger data store containing hundreds of thousands of files, with each file or item having a logical identifier such as a filename. Metadata for each chunk describing the contents of the payload (logical identifiers, etc.) and other information may be stored along with the payload data as further described herein. In addition, the metadata from each chunk may be used by system components to track the content of the payload of each chunk and also contains storage preferences and other information useful for performing storage operations.

Each chunk of data, however, usually contains different metadata. Thus, two instances of the same file will likely be encapsulated by different metadata in two different chunks even if the payload of the two chunks is the same. Similarly, current single instance storage systems would generate different signatures for each chunk of data and therefore store a different copy of each chunk of data even though the payload of each chunk is the same.
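The effect can be shown with a short sketch. The chunk layout used here (a 4-byte length prefix, a JSON header, then the payload) is an assumption made for illustration and is not the tag header format of the referenced applications; `make_chunk` and `split_chunk` are hypothetical helpers.

```python
import hashlib
import json


def make_chunk(payload: bytes, metadata: dict) -> bytes:
    # Hypothetical encapsulation: length-prefixed JSON header, then payload.
    header = json.dumps(metadata, sort_keys=True).encode("utf-8")
    return len(header).to_bytes(4, "big") + header + payload


def split_chunk(chunk: bytes):
    # Reverse of make_chunk: recover the metadata and the raw payload.
    header_len = int.from_bytes(chunk[:4], "big")
    metadata = json.loads(chunk[4:4 + header_len].decode("utf-8"))
    return metadata, chunk[4 + header_len:]


def signature(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


payload = b"identical file contents"
chunk_a = make_chunk(payload, {"chunk_id": 1, "job": "monday"})
chunk_b = make_chunk(payload, {"chunk_id": 2, "job": "tuesday"})

# Hashing whole chunks hides the duplication; hashing payloads exposes it.
whole_chunks_match = signature(chunk_a) == signature(chunk_b)
payloads_match = (signature(split_chunk(chunk_a)[1])
                  == signature(split_chunk(chunk_b)[1]))
```

Under these assumptions `whole_chunks_match` is false while `payloads_match` is true, which is the mismatch the paragraph above describes.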

BRIEF SUMMARY OF THE INVENTION

Systems and methods for single instance storage operations are provided. Systems constructed in accordance with the principles of the present invention may process data containing a payload and associated metadata. Often, chunks of data are copied to traditional archive storage wherein some or all of the chunk, including the payload and associated metadata, is copied to the physical archive storage medium. In some embodiments, chunks of data are designated for storage in single instance storage devices. The system may remove the encapsulation from the chunk and may copy the chunk payload to a single instance storage device. The single instance storage device may return a signature or other identifier for items copied from the chunk payload. The metadata associated with the chunk may be maintained in separate storage and may track the association between the logical identifiers and the signatures, which may be generated by the single instance storage device, for the individual items of the chunk payload.
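One way to picture this flow is the sketch below. It is illustrative only: the chunk is modeled as a dictionary with `metadata` and `payload` entries, the single instance storage device is a plain dictionary keyed by signature, and the SHA-256 computation stands in for the signature a real device would return. All names are hypothetical.

```python
import hashlib


def store_chunk(chunk: dict, sis_device: dict, metadata_store: dict) -> None:
    """Strip the encapsulation and copy each payload item to the device."""
    chunk_id = chunk["metadata"]["chunk_id"]
    item_signatures = {}
    for logical_id, data in chunk["payload"].items():
        # Stand-in for the signature returned by the device.
        sig = hashlib.sha256(data).hexdigest()
        sis_device.setdefault(sig, data)  # stored once per distinct item
        item_signatures[logical_id] = sig
    # Metadata lives in separate storage and maps logical ids to signatures.
    metadata_store[chunk_id] = {
        "metadata": chunk["metadata"],
        "items": item_signatures,
    }


def retrieve_item(chunk_id, logical_id, sis_device, metadata_store) -> bytes:
    sig = metadata_store[chunk_id]["items"][logical_id]
    return sis_device[sig]
```

Two chunks that carry the same file under different metadata then share one stored instance, while each chunk's metadata record still resolves its own logical identifier.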

BRIEF DESCRIPTION OF THE DRAWINGS

The invention is illustrated in the figures of the accompanying drawings which are meant to be exemplary and not limiting, in which like references are intended to refer to like or corresponding parts, and in which:

FIG. 1 is a block diagram of a storage operation cell in a system to perform storage operations on electronic data in a computer network according to an embodiment of the invention;

FIG. 2 is a block diagram of a hierarchically organized group of storage operation cells in a system to perform storage operations on electronic data in a computer network according to an embodiment of the invention;

FIG. 3 is a block diagram of a hierarchically organized group of storage operation cells in a system to perform storage operations on electronic data in a computer network according to an embodiment of the invention;

FIG. 4 is a flow diagram of a method to store chunks of data in a single instance storage device; and

FIG. 5 is a flow diagram of a method for retrieving chunk payload data from a single instance storage device.

DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENTS

With reference to FIGS. 1 through 5, embodiments of the invention are presented. Systems and methods are presented for performing multi-stream storage operations including multi-stream storage operations associated with a single sub-client.

FIG. 1 presents a block diagram of a storage operation cell in a system to perform storage operations on electronic data in a computer network according to an embodiment of the invention. As shown, the storage operation cell may include a storage management component, such as storage manager 100, and one or more of the following: a client 85, a data store 90, a data agent 95, a media management component, such as a media agent 125, a media management component index cache 130, a storage device 135, a storage management component index cache 105, a jobs agent 110, an interface module 115, and a management agent 120. The system and elements thereof are exemplary of a modular storage management system such as the CommVault QiNetix storage management system, available from CommVault Systems, Inc. of Oceanport, N.J., and further described in application Ser. No. 09/610,738, which is incorporated herein by reference in its entirety.

A storage operation cell generally includes combinations of hardware and software components directed to performing storage operations on electronic data. Exemplary storage operation cells according to embodiments of the invention include CommCells as embodied in the QNet storage management system and the QiNetix storage management system by CommVault Systems of Oceanport, N.J., and as further described in Application Ser. No. 60/482,305 and application Ser. No. 09/354,058, which are hereby incorporated by reference in their entirety.

According to some embodiments of the invention, storage operation cells are related to backup cells and provide all of the functionality of backup cells as further described in application Ser. No. 09/354,058. Storage operation cells also perform additional types of storage operations and provide other types of storage management functionality. According to embodiments of the invention, storage operation cells perform storage operations which also include, but are not limited to, creation, storage, retrieval, migration, deletion, and tracking of primary or production volume data, secondary volume data, primary copies, secondary copies, auxiliary copies, snapshot copies, backup copies, incremental copies, differential copies, HSM copies, archive copies, Information Lifecycle Management (“ILM”) copies, and other types of copies and versions of electronic data. In some embodiments, storage operation cells also provide an integrated management console through which users or system processes can perform storage operations on electronic data as further described herein.

A storage operation cell can be organized and associated with other storage operation cells forming a logical hierarchy among various components of a storage management system as further described herein. Storage operation cells generally include a storage manager 100, and, according to some embodiments, one or more other components including, but not limited to, a client computer 85, a data agent 95, a media management component 125, a storage device 135, and other components as further described herein.

For example, a storage operation cell may contain a data agent 95, which is generally a software module responsible for performing storage operations related to client computer 85 data stored in a data store 90 or other memory location, for example archiving, migrating, and recovering client computer data. In some embodiments, a data agent performs storage operations in accordance with one or more storage policies or other preferences. A storage policy is generally a data structure or other information that may include a set of preferences and other storage criteria for performing a storage operation. The preferences and storage criteria may include, but are not limited to: a storage location, relationships between system components, network pathway to utilize, retention policies, data characteristics, compression or encryption requirements, preferred system components to utilize in a storage operation, and other criteria relating to a storage operation. As further described herein, storage policies may be stored to a storage manager index, to archive media as metadata for use in restore operations or other storage operations, or to other locations or components of the system.
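As a rough illustration of a storage policy as a data structure, the sketch below models a few of the listed criteria as fields of a Python dataclass. The field names, types, and defaults are assumptions made for this example and do not reflect the layout used by any particular product.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class StoragePolicy:
    """Hypothetical container for preferences that guide a storage operation."""
    name: str
    storage_location: str            # e.g. an identifier for a storage device
    retention_days: int              # simplified retention policy
    compress: bool = False           # compression requirement
    encrypt: bool = False            # encryption requirement
    network_pathway: Optional[str] = None
    preferred_media_agents: List[str] = field(default_factory=list)


# Example policy a data agent might consult before packaging client data.
exchange_policy = StoragePolicy(
    name="exchange-mailboxes",
    storage_location="storage-device-135",
    retention_days=90,
    encrypt=True,
    preferred_media_agents=["media-agent-125"],
)
```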

Each client computer 85 generally has at least one data agent 95 and the system can support many client computers 85. The system also generally provides a plurality of data agents 95, each of which is intended to perform storage operations related to data associated with a different application, for example to back up, migrate, and recover application specific data. For example, different individual data agents 95 may be designed to handle Microsoft Exchange data, Lotus Notes data, Microsoft Windows 2000 file system data, Microsoft Active Directory Objects data, and other types of data known in the art.

If a client computer 85 has two or more types of data, one data agent 95 is generally required for each data type to perform storage operations related to client computer 85 data. For example, to back up, migrate, and restore all of the data on a Microsoft Exchange 2000 server, the client computer 85 would use one Microsoft Exchange 2000 Mailbox data agent 95 to back up the Exchange 2000 mailboxes, one Microsoft Exchange 2000 Database data agent 95 to back up the Exchange 2000 databases, one Microsoft Exchange 2000 Public Folder data agent 95 to back up the Exchange 2000 Public Folders, and one Microsoft Windows 2000 File System data agent 95 to back up the client computer's 85 file system. These data agents 95 would be treated as four separate data agents 95 by the system even though they reside on the same client computer 85. In some embodiments, separate data agents may be combined to form a virtual data agent (not shown) for performing storage operations related to a specific application. Thus, the four separate data agents of the previous example could be combined as a virtual data agent suitable for performing storage operations related to all types of Microsoft Exchange 2000 and/or Windows 2000 data.

The storage manager 100 is generally a software module or application that coordinates and controls storage operations performed by the storage operation cell. The storage manager 100 communicates with all elements of the storage operation cell including client computers 85, data agents 95, media management components 125, and storage devices 135 regarding storage operations, for example to initiate and manage system backups, migrations, and recoveries. The storage manager 100 also communicates with other storage operation cells as further described herein.

The storage manager 100 includes a jobs agent 110 software module which monitors the status of all storage operations that have been performed, that are being performed, or that are scheduled to be performed by the storage operation cell. The jobs agent 110 is communicatively coupled with an interface agent 115 software module. The interface agent 115 provides presentation logic, such as a graphical user interface (“GUI”), an application program interface (“API”), or other interface by which users and system processes can retrieve information about the status of storage operations and issue instructions to the storage operation cell regarding performance of storage operations as further described herein. For example, a user might modify the schedule of a number of pending snapshot copies or other types of copies. As another example, a user might use the GUI to view the status of all storage operations currently pending in all storage operation cells or the status of particular components in a storage operation cell.

The storage manager 100 also includes a management agent 120 software module. The management agent 120 generally provides an interface with other management components 100 in other storage operation cells through which information and instructions regarding storage operations may be conveyed. For example, in some embodiments as further described herein, a management agent 120 in a first storage operation cell can communicate with a management agent 120 in a second storage operation cell regarding the status of storage operations in the second storage operation cell. In some embodiments, a management agent 120 in a first storage operation cell can communicate with a management agent 120 in a second storage operation cell to control the storage manager 100 (and other components) of the second storage operation cell via the management agent 120 contained in the storage manager 100 for the second storage operation cell. In other embodiments, the management agent 120 in the first storage operation cell communicates directly with and controls the components in the second storage management cell and bypasses the storage manager 100 in the second storage management cell. Storage operation cells can thus be organized hierarchically, as further described herein.

A media management component 125 is generally a software module that conducts data, as directed by a storage manager 100, between client computers 85 and one or more storage devices 135. The media management component 125 is communicatively coupled with and generally configured to control one or more storage devices 135. For example, the media management component 125 might instruct a storage device 135 to use a robotic arm or other means to load or eject a media cartridge, and to archive, migrate, or restore application specific data. The media management component 125 generally communicates with storage devices 135 via a local bus such as a SCSI adaptor. In some embodiments, the storage device 135 is communicatively coupled to the media management component 125 via a Storage Area Network (“SAN”).

Each media management component 125 maintains an index cache 130 which stores index data the system generates during storage operations as further described herein. For example, storage operations for Microsoft Exchange data generate index data. Index data may include, for example, information regarding the location of the stored data on a particular media, information regarding the content of the data stored such as file names, sizes, creation dates, formats, application types, and other file-related criteria, information regarding one or more clients associated with the data stored, information regarding one or more storage policies, storage criteria, or storage preferences associated with the data stored, compression information, retention-related information, encryption-related information, stream-related information, and other types of information. Index data thus provides the system with an efficient mechanism for performing storage operations including locating user files for recovery operations and for managing and tracking stored data. The system generally maintains two copies of the index data regarding particular stored data. A first copy is generally stored with the data copied to a storage device 135. Thus, a tape may contain the stored data as well as index information related to the stored data. In the event of a system restore, the index data stored with the stored data can be used to rebuild a media management component index 130 or other index useful in performing and/or managing storage operations. In addition, the media management component 125 that controls the storage operation also may generally write an additional copy of the index data to its index cache 130. The data in the media management component index cache 130 is generally stored on faster media, such as magnetic media, and is thus readily available to the system for use in storage operations and other activities without having to be first retrieved from the storage device 135.
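The two-copy arrangement can be sketched as follows. This is a simplified illustration under stated assumptions: the storage medium is modeled as a Python list of records, each index entry holds only a name, size, and position, and the class `MediaAgentSketch` is hypothetical.

```python
class MediaAgentSketch:
    """Toy media management component that keeps a local index cache."""

    def __init__(self):
        self.index_cache = {}  # fast local copy of the index data

    def store(self, media: list, name: str, data: bytes) -> None:
        entry = {"name": name, "size": len(data), "position": len(media)}
        # First copy: index data is written alongside the stored data.
        media.append({"data": data, "index": entry})
        # Second copy: the same index data goes into the local cache.
        self.index_cache[name] = entry

    def rebuild_cache(self, media: list) -> None:
        # After a system restore, reconstruct the cache from the media.
        self.index_cache = {rec["index"]["name"]: rec["index"] for rec in media}

    def locate(self, media: list, name: str) -> bytes:
        # The cache answers "where is it" without scanning the media.
        return media[self.index_cache[name]["position"]]["data"]
```

A fresh agent that calls `rebuild_cache` on the same media ends up with an index cache equal to the one held by the agent that wrote the data.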

The storage manager 100 may also maintain an index cache 105. Storage manager index data may be, among other things, used to indicate, track, and associate logical relationships and associations between components of the system, user preferences, management tasks, and other useful data. For example, the storage manager 100 might use its index cache 105 to track logical associations between media management components 125 and storage devices 135. The storage manager 100 may also use index cache 105 to track the status of storage operations to be performed, storage patterns associated with the system components such as media use, storage growth, network bandwidth, service level agreement (“SLA”) compliance levels, data protection levels, storage policy information, storage criteria associated with user preferences, retention criteria, storage operation preferences, and other storage-related information. Index caches 105 and 130 typically reside on their corresponding storage component's hard disk or other fixed storage device.

For example, jobs agent 110 of a storage manager component 100 may retrieve storage manager index 105 data regarding a storage policy and storage operation to be performed or scheduled for a particular client 85. The jobs agent 110, either directly or via the interface module 115, communicates with the data agent 95 at the client 85 regarding the storage operation. In some embodiments, the jobs agent 110 also retrieves from index cache 105 a storage policy associated with client 85 and uses information from the storage policy to communicate to data agent 95 one or more media management components 125 associated with performing storage operations for that particular client 85 as well as other information regarding the storage operation to be performed such as retention criteria, encryption criteria, streaming criteria, etc. Data agent 95 may then package or otherwise manipulate client data stored in client data store 90 in accordance with the storage policy information and/or according to a user preference, and may communicate this client data to the appropriate media management component(s) 125 for processing. Media management component(s) 125 may store the data according to storage preferences associated with the storage policy including storing the generated index data with the stored data, as well as storing a copy of the generated index data in the media management component index cache 130.

In some embodiments, components of the system may reside and execute on the same computer. In some embodiments, a client computer 85 component such as a data agent 95, a media management component 125, or a storage manager 100 coordinates and directs storage operations as further described in application Ser. No. 09/610,738. This client computer 85 component can function independently or together with other similar client computer 85 components.

FIG. 2 presents a block diagram of a hierarchically organized group of storage operation cells in a system to perform storage operations on electronic data in a computer network according to an embodiment of the invention. As shown, the system may include a master storage manager component 140, a first storage operation cell 145, a second storage operation cell 150, a third storage operation cell 155, a fourth storage operation cell 160, a fifth storage operation cell 165, and an nth storage operation cell 170.

As previously described, storage operation cells are often communicatively coupled and hierarchically organized. For example, as shown in FIG. 2, a master storage manager 140 is associated with, communicates with, and directs storage operations for a first storage operation cell 145, a second storage operation cell 150, a third storage operation cell 155, a fourth storage operation cell 160, a fifth storage operation cell 165, and an nth storage operation cell 170. In some embodiments, the master storage manager 140 is not part of any particular storage operation cell. In other embodiments (not shown), the master storage manager 140 may itself be part of a storage operation cell.

Thus, the master storage manager 140 communicates with the manager agent of the storage manager of the first storage operation cell 145 (or directly with the other components of the first cell 145) regarding storage operations performed in the first storage operation cell 145. For example, in some embodiments, the master storage manager 140 instructs the first storage operation cell 145 how and when to perform storage operations including the type of operation to perform and the data on which to perform the operation.

In other embodiments, the master storage manager 140 tracks the status of its associated storage operation cells, such as the status of jobs, system components, system resources, and other items, by communicating with manager agents (or other components) in the respective storage operation cells. In other embodiments, the master storage manager 140 tracks the status of its associated storage operation cells by receiving periodic status updates from the manager agents (or other components) in the respective cells regarding jobs, system components, system resources, and other items. For example, in some embodiments, the master storage manager 140 uses methods to monitor network resources such as mapping network pathways and topologies to, among other things, physically monitor storage operations and suggest alternate routes for storing data as further described herein. The master storage manager 140 also uses methods to monitor primary and secondary storage trends, storage status, media usage, data protection levels, and other storage-related information as further described herein.

In some embodiments, the master storage manager 140 stores status information and other information regarding its associated storage operation cells and the system in an index cache or other data structure accessible to the master storage manager 140. In some embodiments, as further described herein, the presentation interface of the master storage manager 140 accesses this information to present users and system processes with information regarding the status of storage operations, storage operation cells, system components, and other information of the system.

Storage operation cells may thus be organized hierarchically. Consequently, storage operation cells may inherit properties from “parent” or hierarchically superior cells or be controlled by other storage operation cells in the hierarchy. Thus, in some embodiments as shown in FIG. 2, the second storage operation cell 150 controls or is otherwise superior to the third storage operation cell 155, the fourth storage operation cell 160, the fifth storage operation cell 165, and the nth storage operation cell 170. Similarly, the fourth storage operation cell 160 controls the fifth storage operation cell 165 and the nth storage operation cell 170.
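Property inheritance in such a hierarchy might be modeled as in the sketch below. The class `CellSketch`, the `retention_days` property, and its values are hypothetical; only the parent and child arrangement of cells 150, 155, 160, 165, and 170 follows the description above.

```python
class CellSketch:
    """Toy storage operation cell that inherits properties from its parent."""

    def __init__(self, name, parent=None, **properties):
        self.name = name
        self.parent = parent
        self.properties = properties

    def get(self, key):
        # Walk up the hierarchy until some cell defines the property.
        cell = self
        while cell is not None:
            if key in cell.properties:
                return cell.properties[key]
            cell = cell.parent
        raise KeyError(key)

    def is_controlled_by(self, other) -> bool:
        # True if `other` is hierarchically superior to this cell.
        cell = self.parent
        while cell is not None:
            if cell is other:
                return True
            cell = cell.parent
        return False


cell_150 = CellSketch("cell-150", retention_days=30)
cell_155 = CellSketch("cell-155", parent=cell_150)
cell_160 = CellSketch("cell-160", parent=cell_150, retention_days=7)
cell_165 = CellSketch("cell-165", parent=cell_160)
cell_170 = CellSketch("cell-170", parent=cell_160)
```

In this arrangement cell 155 inherits the value set on cell 150, while cells 165 and 170 pick up the override set on cell 160.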

Storage operation cells may also be organized hierarchically according to criteria such as function (e.g., superior or subordinate), geography, architectural considerations, or other factors useful in performing storage operations. For example, in one embodiment storage operation cells are organized according to types of storage operations: the first storage operation cell 145 may be directed to performing snapshot copies of primary copy data, and the second storage operation cell 150 may be directed to performing backup copies of primary copy data or other data. In another embodiment, the first storage operation cell 145 may represent a geographic segment of an enterprise, such as a Chicago office, and a second storage operation cell 150 represents a different geographic segment, such as a New York office. In this example, the second storage operation cell 150, the third storage operation cell 155, the fourth storage operation cell 160, the fifth storage operation cell 165, and the nth storage operation cell 170 could represent departments within the New York office. Alternatively, these storage operation cells could be further divided by function, performing various types of copies for the New York office or load balancing storage operations for the New York office.

In some embodiments, hierarchical organization of storage operation cells may facilitate, among other things, system security and other considerations. For example, in some embodiments, only authorized users are allowed to access or control certain storage operation cells. For example, a network administrator for an enterprise might have access to all storage operation cells including the master storage manager 140. But a network administrator for only the New York office, according to a previous example, might only satisfy access criteria to have access to the second storage operation cell 150, the third storage operation cell 155, the fourth storage operation cell 160, the fifth storage operation cell 165, and the nth storage operation cell 170 which comprise the New York office storage management system.

In some embodiments, hierarchical organization of storage operation cells facilitates storage management planning and decision-making. For example, in some embodiments, a user of the master storage manager 140 can view the status of all jobs in the associated storage operation cells of the system as well as the status of each component in every storage operation cell of the system. The user can then plan and make decisions based on this global data. For example, the user can view a high-level report of summary information regarding storage operations for the entire system, such as job completion status, component availability status, resource usage status (such as network pathways, etc.), and other information. The user can also drill down through menus or use other means to obtain more detailed information regarding a particular storage operation cell or group of storage operation cells.

In other embodiments, the master storage manager 140 may alert a user or system administrator when a particular resource is unavailable (e.g., temporarily or permanently) or congested. A storage device may be full or require additional media. Alternatively, a storage manager in a particular storage operation cell may be unavailable due to hardware failure, software problems, or other reasons. In some embodiments, the master storage manager 140 (or another storage manager within the hierarchy of storage operation cells) may utilize the global data regarding its associated storage operation cells at its disposal to suggest solutions to such problems when they occur or even before they occur. For example, the master storage manager 140 might alert the user that a storage device in a particular storage operation cell was full or otherwise congested, and then suggest, based on job and data storage information contained in its index cache, an alternate storage device.
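
By way of illustration only, the suggestion of an alternate storage device might be sketched as follows. The index cache layout, the device names, the `free_gb` field, and the function `suggest_alternate` are all hypothetical and are not part of the described system; the sketch simply assumes the index cache records the free capacity of each device and picks the device with the most free space.

```python
def suggest_alternate(index_cache, full_device, required_gb):
    """Return the name of another device with enough free space, or None.

    index_cache maps a device name to a dict holding a hypothetical
    'free_gb' figure; this layout is assumed for illustration.
    """
    candidates = [
        (info["free_gb"], name)
        for name, info in index_cache.items()
        if name != full_device and info["free_gb"] >= required_gb
    ]
    # Prefer the candidate with the most free space.
    return max(candidates)[1] if candidates else None


# Hypothetical index cache contents.
index_cache = {
    "tape-library-1": {"free_gb": 0},
    "disk-array-2": {"free_gb": 500},
    "disk-array-3": {"free_gb": 120},
}
alternate = suggest_alternate(index_cache, "tape-library-1", required_gb=100)
```

Other selection criteria, such as network congestion or storage policy, could equally be used in place of free capacity.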

As another example, in some embodiments the master storage manager 140 (or other network storage manager) contains programming directed to analyzing the storage patterns and resources of its associated storage operation cells and which suggests optimal or alternate methods of performing storage operations. Thus, for example, the master storage manager 140 may analyze traffic patterns to determine that snapshot data should be sent via a different network segment or to a different storage operation cell or storage device. In some embodiments, users can direct specific queries to the master storage manager 140 regarding predicting storage operations or regarding storage operation information.

FIG. 3 presents a block diagram of a hierarchically organized group of storage operation cells in a system to perform storage operations on electronic data in a computer network according to an embodiment of the invention. As shown, FIG. 3 includes a first storage operation cell 175, a second storage operation cell 180, a third storage operation cell 185, a client 190 in communication with a primary volume 195 storing production or other “live” data, a storage manager component 200 in communication with a storage manager index data store 205, a media management component 210 in communication with a media management component index 215, a secondary storage device or volume 220, and a master storage manager component 225 in communication with a master storage manager index data store 230.

According to an embodiment of the invention, the first storage operation cell 175 may be directed to a particular type of storage operation, such as SRM storage operations. For example, the first storage operation cell 175 monitors and performs SRM-related calculations and operations associated with primary volume 195 data. Thus, the first storage operation cell 175 includes a client component 190 in communication with a primary volume 195 storing data. For example, the client 190 may be directed to using Exchange data, SQL data, Oracle data, or other types of production data used in business applications or other applications and stored in primary volume 195. Storage manager component 200 in cell 175 contains SRM modules or other logic directed to monitoring or otherwise interacting with attributes, characteristics, metrics, and other information associated with the data stored in primary volume 195. Storage manager 200 tracks and stores this information and other information in storage manager index 205. For example, in some embodiments, storage manager component 200 tracks the amount of available space and other similar characteristics of data associated with primary volume 195. In some embodiments, as further described herein, storage manager component 200 may also issue alerts or take other actions when the information associated with primary volume 195 satisfies certain criteria, such as alert criteria.

The second storage operation cell 180 may be directed to another type of storage operation, such as HSM storage operations. For example, the second storage operation cell 180 may perform backups, migrations, snapshots, or other types of HSM-related operations known in the art. For example, in some embodiments, data is migrated from faster and more expensive storage such as magnetic storage to less expensive storage such as tape storage.

In some embodiments, storage operation cells may also contain logical groupings of the same physical devices. Thus, the second storage operation cell 180 includes the client component 190 in communication with the primary volume 195 storing data, and client component 190 and primary volume 195 in the second storage operation cell 180 are the same physical devices as the client component 190 and primary volume 195 in the first storage operation cell 175. Similarly, in some embodiments, the storage manager component 200 and index 205 in the second storage operation cell 180 are the same physical devices as the storage manager component and index in the first storage operation cell 175. The storage manager component 200, however, also contains HSM modules or other logic associated with the second storage operation cell 180 directed to performing HSM storage operations on primary volume 195 data.

The second storage operation cell 180 therefore also contains a media management component 210, a media management component index 215, and a secondary storage volume 220 directed to performing HSM-related operations on primary copy data. For example, storage manager 200 migrates primary copy data from primary volume 195 to secondary volume 220 using media management component 210. Storage manager 200 also tracks and stores information associated with primary copy migration and other similar HSM-related operations in storage manager index 205. For example, in some embodiments, storage manager component 200 directs HSM storage operations on primary copy data according to a storage policy associated with the primary copy 195 and stored in the index 205. In some embodiments, storage manager 200 also tracks where primary copy information is stored, for example in secondary storage 220.

The third storage operation cell 185 contains a master storage manager 225 and a master storage manager index 230. In some embodiments (not shown), additional storage operation cells might be hierarchically located between the third storage operation cell 185 and the first storage operation cell 175 or the second storage operation cell 180. In some embodiments, additional storage operation cells hierarchically superior to the third storage operation cell 185 may also be present in the hierarchy of storage operation cells.

In some embodiments, the third storage operation cell 185 is also generally directed to performing a type of storage operation, such as integration of SRM and HSM data from other storage operation cells, such as the first storage operation cell 175 and the second storage operation cell 180. In other embodiments, the third storage operation cell 185 also performs other types of storage operations and might also be directed to HSM, SRM, or other types of storage operations. In some embodiments, the master storage manager 225 of the third storage operation cell 185 aggregates and processes network and storage-related data provided by other manager components 200 in other storage operation cells 175 and 180 in order to provide, among other information, reporting information regarding particular cells, groups of cells, or the system as a whole.

FIG. 4 presents a flow diagram of a method to store chunks of data in a single instance storage device. The system may receive or generate an instruction to copy one or more chunks of data to a single instance storage device, step 235. For example, the system may receive a message to copy twenty chunks of archive data comprising a client e-mail data store containing thousands of files in each chunk. As discussed, each chunk of data may contain payload information representing the data items from the client data store as well as metadata describing the contents of each chunk, storage preferences associated with each chunk, associations between chunks, etc.

The chunk metadata may be separated from the chunk payload at step 240. For example, data pipe modules as further described herein may unencapsulate the chunk to extract payload information or otherwise process the chunk to separate the chunk metadata from the chunk payload. The chunk metadata may be copied to a data store, for example a storage management component index or a media management component index, and associated with the chunk payload to be stored in the single instance storage device as further described herein (step 245). In some embodiments, the chunk metadata may also be copied to a single instance storage device, but be logically or physically separated (e.g., maintained as a separate container or data item from the chunk payload). Thus, the chunk payload may be saved as a single instance while the metadata can still be preserved.
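
Steps 240 and 245 might be sketched as follows, purely for illustration. The `Chunk` layout, the field names, and the function `separate_chunk` are hypothetical; the sketch assumes a chunk can be modeled as a metadata dictionary plus a payload mapping logical identifiers to item bytes, and that the index is a simple in-memory dictionary.

```python
from dataclasses import dataclass


@dataclass
class Chunk:
    """Hypothetical encapsulated chunk: metadata plus payload items."""
    chunk_id: str
    metadata: dict   # e.g., item listing and storage preferences
    payload: dict    # logical identifier -> item bytes


def separate_chunk(chunk, index):
    """Split metadata from payload (step 240) and copy the metadata
    to an index keyed by chunk identifier (step 245)."""
    index[chunk.chunk_id] = dict(chunk.metadata)
    # The payload is returned for copying to single instance storage.
    return chunk.payload


chunk = Chunk(
    chunk_id="chunk-1",
    metadata={"items": ["File A", "File B"], "policy": "archive"},
    payload={"File A": b"attachment", "File B": b"attachment"},
)
index = {}
payload = separate_chunk(chunk, index)
```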

Separation of metadata from payload data may be performed in a number of ways. For example, a data agent residing on a client may separate the chunk metadata from the payload data and transmit each portion to a single instance storage device either separately or together to one or more destinations. This may also involve, as described herein, transmitting metadata and payload data to different logical or physical locations, with the appropriate update of management components or indices to maintain correlation between the payload data and the metadata. Other arrangements may include one or more media agents examining or analyzing chunks to be transmitted and separating metadata from payload data at the client or sub-client based on direction from a media agent, or such separation may occur while the chunk is substantially in transit, with the media agents routing the payload to one location and the metadata to another. Moreover, such separation may occur at the storage device(s) with certain metadata and/or payload data tagged or otherwise deemed suitable for single instance storage and separated, for example, while queued at the storage device, whereas other metadata and payload data, not suitable for single instance storage, may be stored elsewhere.

Items from the chunk payload may be copied on an item-by-item basis to the single instance storage device at step 250. Thus, the chunk payload may contain several thousand e-mail messages and attachments, each of which is copied to the single instance storage device for storage. The single instance storage device may generate a suitable identifier, such as a signature, a cryptographic hash, or other identifier, for each item, and store each item as appropriate according to its identifier (e.g., items previously stored are generally not stored again and may simply be overwritten, new items for which identifiers do not already exist on an index, map, or other tracking means are stored, etc.). The identifier may be communicated for each item to a system component, such as a storage management component or a media management component, responsible for maintaining the index data containing the metadata associated with the chunk payload (step 255).
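
A minimal sketch of a single instance storage device that behaves as described for steps 250 and 255 is given below. The class `SingleInstanceStore` and its methods are hypothetical, and the use of SHA-256 as the identifier is an assumption made for illustration; the description leaves the choice of signature or hash open.

```python
import hashlib


class SingleInstanceStore:
    """Hypothetical content-addressed store keeping one copy per item."""

    def __init__(self):
        self._items = {}  # identifier -> item bytes

    def put(self, data: bytes) -> str:
        # Assumption: a SHA-256 digest serves as the item identifier.
        identifier = hashlib.sha256(data).hexdigest()
        # Items whose identifier already exists are not stored again.
        if identifier not in self._items:
            self._items[identifier] = data
        # An identifier is returned for every item, new or duplicate.
        return identifier

    def get(self, identifier: str) -> bytes:
        return self._items[identifier]

    def __len__(self):
        return len(self._items)


store = SingleInstanceStore()
id_a = store.put(b"quarterly report")
id_b = store.put(b"quarterly report")   # duplicate content
id_c = store.put(b"meeting notes")
```

In this sketch the two identical items yield the same identifier and occupy a single entry, while the third item is stored separately.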

In some embodiments, however, certain sections of payload data may not be suitable for single instance storage. In this case, such data may be separated from the other chunk data and stored using conventional means. Thus, single instance data and other data from the chunk may be stored in logically or physically separate locations. This allows at least certain portions of chunk data to be stored using single instance storage techniques. A data index, database or other management component may keep track of the various locations to facilitate subsequent restore or copy operations.

The identifier returned for each item from the single instance storage device may be associated with the identifier for the item maintained in the metadata associated with the chunk (step 260). For example, chunk metadata generally tracks the contents of the payload of each chunk on an item by item basis. A chunk containing a payload of 1000 e-mail messages and attachments may contain metadata identifying the 1000 items of the payload, for example by file name, message index ID, or other identifier. Thus, the chunk metadata may maintain logical identifiers such as file system identifiers or other identifiers corresponding to each item stored in the chunk payload.

When single instance storage identifiers are returned for each item processed and stored by the single instance storage device, the single instance identifiers are associated with the logical identifiers previously maintained in the chunk metadata. Thus, a map or other index structure may be created and maintained in the copy of metadata associated with the chunk payload items stored in single instance storage that correlates single instance storage identifiers or physical storage identifiers with the logical identifiers maintained in the original chunk metadata prior to single instance storage. For example, the original chunk metadata may contain separate entries for a File A and a File B which are actually instances of the same copy of data, for example of the same e-mail attachment. When File A and File B are processed by the single instance storage device, they may each generate the same single instance storage identifier, such as a hash or signature, and the single instance storage device would know to only store one copy or instance of the data. Nevertheless, the single instance storage device would still return an identifier for each file. Thus, when File A was sent to the single instance storage device, its signature would be returned and associated with File A in the chunk metadata. Similarly, when File B was sent to the single instance storage device, the same signature would be returned, but this time associated with File B in the chunk metadata. This arrangement allows the single instance of the attachment to be referenced by both files rather than storing two instances of the same attachment.
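
The File A and File B example might be sketched as follows. The function `store_payload` and the dictionary-based store are hypothetical, and SHA-256 is again only an assumed choice of signature; the point illustrated is the map from logical identifiers to single instance storage identifiers (step 260).

```python
import hashlib


def store_payload(payload, single_instance_store):
    """Copy each payload item to the store and return a map of
    logical identifier -> single instance storage identifier."""
    id_map = {}
    for logical_id, data in payload.items():
        signature = hashlib.sha256(data).hexdigest()  # assumed signature scheme
        # Only the first instance of a given signature is stored.
        single_instance_store.setdefault(signature, data)
        # The signature is still recorded for every logical item.
        id_map[logical_id] = signature
    return id_map


payload = {
    "File A": b"same e-mail attachment",
    "File B": b"same e-mail attachment",
    "File C": b"a different message",
}
single_instance_store = {}
id_map = store_payload(payload, single_instance_store)
```

Here File A and File B map to the same signature, so the store holds two items for three logical entries.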

Thus, the chunk metadata can still be used to recreate the original chunk of data or to retrieve various files from the chunk according to the original chunk metadata. For example, a user or other system component could be presented with a logical view of the chunk by recreating a representation of the chunk contents using the chunk metadata. Thus, although only 600 of 1000 files might be stored in single instance storage due to multiple instances of data, etc., the system could still present a logical view of the chunk containing 1000 files (including, for example, the same instances of data with different file names, etc.). If a user wanted to retrieve a file, as further described herein, the system may use the map correlating logical identifiers of the original chunk metadata/payload with single instance storage identifiers to retrieve the requested item from single instance storage. Similarly, users can also browse or otherwise navigate or interact with items in the single instance storage device to view items actually stored, etc. For example, a user might wish to interact with contents of a chunk payload containing 1000 files which would normally use 1000 MB of storage space on non-single instanced storage and only occupies 500 MB on single instanced storage.

In this case, the user may perform storage operations regarding each of the 1000 files according to the 1000 logical identifiers maintained by the chunk metadata, determine that the 1000 files are only costing the system 500 MB in terms of storage (due to their storage on single instance storage), understand that the 1000 files are stored as 500 files, for example, on the single instance storage device, and also understand that the 1000 files would require 1000 MB of storage space on non-single instanced storage.
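
The figures in this example can be reproduced with a short accounting sketch. The function `storage_report`, the identifier names, and the assumption that every file is exactly 1 MB and appears exactly twice are illustrative only.

```python
def storage_report(id_map, size_mb):
    """Summarize logical versus single instance storage usage.

    id_map:  logical identifier -> single instance storage identifier
    size_mb: single instance storage identifier -> item size in MB
    """
    stored = set(id_map.values())
    return {
        "logical_files": len(id_map),
        "stored_items": len(stored),
        "logical_mb": sum(size_mb[s] for s in id_map.values()),
        "single_instance_mb": sum(size_mb[s] for s in stored),
    }


# Assumed data: 1000 logical files of 1 MB each, every item duplicated once.
id_map = {f"file-{i}": f"sig-{i % 500}" for i in range(1000)}
size_mb = {f"sig-{i}": 1 for i in range(500)}
report = storage_report(id_map, size_mb)
```

Under these assumptions the report shows 1000 logical files stored as 500 items, using 500 MB rather than 1000 MB.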

In some embodiments, the system may also support cross-referencing between copies of metadata regarding different chunks (step 265). For example, the system may cross-reference single instance storage identifiers to identify duplications among items contained in a plurality of chunks of data in order to more accurately provide storage-related information and perform storage operations. For example, the system may track on a system-wide level across all chunks how much data is stored as multiple instances, etc.
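
One way such cross-referencing (step 265) might be sketched is shown below. The function `cross_reference` and the sample chunk maps are hypothetical; the sketch assumes each chunk's metadata copy exposes a map of logical identifiers to single instance storage identifiers.

```python
from collections import defaultdict


def cross_reference(chunk_maps):
    """Find single instance identifiers referenced more than once.

    chunk_maps: chunk id -> {logical identifier -> single instance identifier}
    Returns: identifier -> list of (chunk id, logical identifier) references.
    """
    references = defaultdict(list)
    for chunk_id, id_map in chunk_maps.items():
        for logical_id, signature in id_map.items():
            references[signature].append((chunk_id, logical_id))
    # Keep only identifiers with multiple logical references.
    return {sig: refs for sig, refs in references.items() if len(refs) > 1}


chunk_maps = {
    "chunk-1": {"File A": "sig-1", "File B": "sig-1", "File C": "sig-2"},
    "chunk-2": {"File D": "sig-1", "File E": "sig-3"},
}
duplicates = cross_reference(chunk_maps)
```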

FIG. 5 presents a flow diagram of a method for retrieving chunk payload data from a single instance storage device. At step 335, the system may receive or generate a request to retrieve data from single instance storage. For example, the system may receive a request to migrate the payload of a chunk to less expensive storage media such as tape media.

At step 340, the logical identifier of the first item of the chunk payload as identified by the metadata may be correlated to its corresponding single instance storage identifier. This item may be requested from the single instance storage device using its single instance storage identifier. For example, an item as originally contained in the chunk payload may have been described in the chunk metadata as File A whereas File A may be associated with a single instance storage signature of 207778627604938. To restore File A, the system may request the item having this storage signature from the single instance storage device. Other files in the chunk payload may also have the same single instance identifier, but will likely have different logical identifiers in the chunk metadata. Thus, each item may be retrieved from the single instance storage device using its single instance storage identifier and then reassociated in a new chunk with its previous logical identifier as further described herein (step 345).

At step 350, the system may consult the chunk metadata to determine whether additional items remain in the original payload of the chunk. If additional items remain, the system may return to step 340 and the next item logically identified by the chunk metadata is retrieved from the single instance storage device using its single instance storage identifier and then reassociated with its previous logical identifier. When no further items remain to be retrieved from the single instance storage device, the system finishes recreating the chunk by encapsulating all of the items retrieved with the appropriate chunk metadata (step 355), and copies the new copy of the original chunk to the desired storage location (step 360).
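
The loop of steps 340 through 355 might be sketched as follows. The function `recreate_chunk`, the metadata layout, and the dictionary-based store are hypothetical; only the signature value for File A is taken from the example above.

```python
def recreate_chunk(chunk_metadata, id_map, single_instance_store):
    """Rebuild a chunk from single instance storage.

    chunk_metadata: dict with an 'items' list of logical identifiers
    id_map: logical identifier -> single instance storage identifier
    single_instance_store: single instance storage identifier -> item bytes
    """
    payload = {}
    # Steps 340-350: visit each item listed in the chunk metadata.
    for logical_id in chunk_metadata["items"]:
        signature = id_map[logical_id]
        # Step 345: retrieve by signature, reassociate with the logical id.
        payload[logical_id] = single_instance_store[signature]
    # Step 355: encapsulate the retrieved items with the chunk metadata.
    return {"metadata": chunk_metadata, "payload": payload}


single_instance_store = {
    "207778627604938": b"quarterly report",
    "sig-2": b"meeting notes",
}
id_map = {
    "File A": "207778627604938",
    "File B": "207778627604938",
    "File C": "sig-2",
}
metadata = {"chunk_id": "chunk-1", "items": ["File A", "File B", "File C"]}
new_chunk = recreate_chunk(metadata, id_map, single_instance_store)
```

Although the store holds only two items, the recreated chunk again contains three logical items, with File A and File B carrying the same content.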

In some embodiments, the chunk may be recreated on an item by item basis as items are returned from the single instance storage device. In other embodiments, items are first returned to a buffer or other temporary storage location until all items are returned and then the chunk is recreated. Thus, the new copy of the chunk is generally an exact copy of the chunk before it was stored in single instance storage, yet the metadata regarding the chunk is preserved for use in future storage operations.

Systems and modules described herein may comprise software, firmware, hardware, or any combination(s) of software, firmware, hardware, or other means suitable for the purposes described herein. Software and other modules may reside on servers, workstations, personal computers, computerized tablets, PDAs, and other devices suitable for the purposes described herein. Software and other modules may be accessible via local memory, via a network, via a browser or other application in an ASP context, or via other means suitable for the purposes described herein. Data structures described herein may comprise computer files, variables, programming arrays, programming structures, or any electronic information storage schemes, methods, or means, or any combinations thereof, suitable for the purposes described herein. User interface elements described herein may comprise elements from graphical user interfaces, command line interfaces, physical interfaces, and other interfaces suitable for the purposes described herein. Screenshots presented and described herein can be displayed differently as known in the art to generally input, access, change, manipulate, modify, alter, and work with information.

While the invention has been described and illustrated in connection with preferred embodiments, many variations and modifications as will be evident to those skilled in this art may be made without departing from the spirit and scope of the invention, and the invention is thus not to be limited to the precise details of methodology or construction set forth above as such variations and modifications are intended to be included within the scope of the invention.

1. A method for performing a storage operation on a computer network comprising: receiving a request to perform the storage operation on a first set of data; analyzing the first set of data; characterizing the first set of data into a first portion and a second portion based on characteristics observed in the analyzing step; copying the first portion of the data to a first single instance storage location; and associating an identifier with the first portion of data stored at the first single instance storage location.
2. The method of claim 1 further comprising updating a database associated with a management component of a storage operation cell with the identifier information.
3. The method of claim 1 wherein the copying further comprises routing the first portion of the data to the first single instance storage location via a media agent.
4. The method of claim 1 wherein the second portion of the data is metadata relating to the second portion of data.
5. The method of claim 3 wherein the first portion of the data is payload data.
6. The method of claim 1 wherein the first portion of data is copied to the first single instance location, based at least in part, on the characterization step.
7. The method of claim 6 wherein the first portion of the data is copied item by item to the first single instance storage location.
8. The method of claim 7 further comprising associating an identifier with the first portion of data stored at the first single instance storage location.
9. The method of claim 8 wherein an identifier is associated with the first portion of data stored at the first single instance storage location, the method further comprising correlating the identifier associated with the first portion of data with an identifier associated with the second portion of data.
10. The method of claim 1 wherein the characterization further comprises: characterizing the first portion of data into first and second sub-portions of data; wherein the first sub-portion of the first portion of data is suitable for single instance storage and the second sub-portion of the first portion is suitable for conventional storage.
11. The method of claim 10 wherein the copying further comprises copying the second sub-portion of the first portion of data to a first storage device.
12. The method of claim 10 wherein the copying further comprises copying the second sub-portion of the first portion of data to a first storage device.
13. The method of claim 10 further comprising: copying the second sub-portion of the first portion of data to a first storage device; copying the second sub-portion of the first portion of data to the first storage device; and updating a database associated with a management component of a storage operation cell to reflect copy operations associated with the data.
14. A method for recreating data stored in a storage network comprising: receiving a request to retrieve a portion of data stored in a storage network; identifying a location of the portion of data, wherein at least some of the portion of data is located in a single instance storage device; retrieving from the single instance storage device data identified in the identifying step; consulting the retrieved data to determine whether additional data relating to the retrieved data is available; and recreating the data portion based, at least in part, on data retrieved from the single instance storage device.
15. The method of claim 16 further comprising retrieving additional data from the single instance storage device if it is determined in the consulting step that additional data relating to the retrieved data is available.